32 research outputs found

    Computing Competencies for Undergraduate Data Science Curricula: ACM Data Science Task Force

    At the August 2017 ACM Education Council meeting, a task force was formed to explore a process to add to the broad, interdisciplinary conversation on data science, with an articulation of the role of computing discipline-specific contributions to this emerging field. Specifically, the task force would seek to define what the computing/computational contributions are to this new field, and provide guidance on computing-specific competencies in data science for departments offering such programs of study at the undergraduate level. There are many stakeholders in the discussion of data science: these include colleges and universities that (hope to) offer data science programs, employers who hope to hire a workforce with knowledge and experience in data science, as well as individuals and professional societies representing the fields of computing, statistics, machine learning, computational biology, computational social sciences, digital humanities, and others. There is a shared desire to form a broad interdisciplinary definition of data science and to develop curriculum guidance for degree programs in data science. This volume builds upon the important work of other groups who have published guidelines for data science education. There is a need to acknowledge the definition and description of the individual contributions to this interdisciplinary field. For instance, those interested in the business context for these concepts generally use the term “analytics”; in some cases, the abbreviation DSA appears, meaning Data Science and Analytics. This volume is the third draft articulation of computing-focused competencies for data science. It recognizes the inherent interdisciplinarity of data science and situates computing-specific competencies within the broader interdisciplinary space.

    Extraction and Use of Contextual Attributes for Theory Completion: An Integration of Explanation-Based and Similarity-Based Learning

    Andrea Pohoreckyj Danyluk. This research investigates the use of contextual cues to address problems in machine learning that arise from assumptions about the initial knowledge necessary for the acquisition of new information. Machine learning approaches may be placed along a spectrum from purely inductive to purely deductive techniques. Inductive systems possess essentially no explicit knowledge that can be used in acquiring new facts, while deductive systems are assumed to contain a complete theory of the domain. Most work in machine learning has concentrated on approaches at the two ends of the spectrum. This dissertation describes an approach that integrates inductive and deductive methods: it provides a mechanism by which induction can be used to detect and acquire knowledge missing from the domain theory of a deductive system.
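    The integration of deductive and inductive learning described above can be illustrated with a toy sketch (all rules, attributes, and examples below are invented for illustration, not drawn from the dissertation): a deductive rule base classifies a case when its theory covers it, and falls back to similarity-based induction over stored examples when the theory is incomplete.

    ```python
    # Toy sketch of deduction with an inductive fallback (illustrative only).

    # Incomplete domain theory: each rule maps attribute conditions to a class.
    RULES = [
        ({"has_feathers": True, "flies": True}, "bird"),
        ({"has_fur": True}, "mammal"),
        # Missing: a rule covering flightless birds.
    ]

    def deduce(example):
        """Return a class if some rule's conditions all hold, else None."""
        for conditions, label in RULES:
            if all(example.get(attr) == val for attr, val in conditions.items()):
                return label
        return None

    def induce(example, labelled_examples):
        """Similarity-based fallback: label of the case with most attribute overlap."""
        def overlap(a, b):
            return sum(1 for k in a if k in b and a[k] == b[k])
        attrs, label = max(labelled_examples, key=lambda pair: overlap(example, pair[0]))
        return label

    def classify(example, labelled_examples):
        label = deduce(example)
        return label if label is not None else induce(example, labelled_examples)

    cases = [
        ({"has_feathers": True, "flies": True}, "bird"),
        ({"has_fur": True, "flies": False}, "mammal"),
    ]
    penguin = {"has_feathers": True, "flies": False}
    print(classify(penguin, cases))  # deduction fails, induction answers "bird"
    ```

    A fuller version would, as the dissertation proposes, also turn the induced answer into a new rule so the domain theory itself is completed rather than bypassed.
    
    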

    A Comparison of Data Sources for Machine Learning in a Telephone Trouble Screening Expert System

    This paper describes a domain where the application of machine learning, specifically inductive learning, could have enormous positive impact. The domain possesses attributes suggesting that inductive learning should succeed easily; in particular, data for the domain are abundant. In spite of this, numerous machine learning methods, both inductive and otherwise, have failed to learn a knowledge base with high accuracy. This paper presents a comparison of the data sources available for this domain, focusing primarily on a survey system that was ultimately designed to collect data best suited to the task.
    Keywords: knowledge acquisition for expert systems; knowledge elicitation; data collection; data collection interfaces.
    This research was performed while the author was an employee of NYNEX Science and Technology, Inc.
    1 Introduction: Many machine learning techniques, most notably inductive methods, rely upon data from which they …

    Artificial Intelligence Competencies for Data Science Undergraduate Curricula

    In August 2017, the ACM Education Council initiated a task force to add to the broad, interdisciplinary conversation on data science, with an articulation of the role of computing discipline-specific contributions to this emerging field. Specifically, the task force is seeking to define what the computing contributions are to this new field, in order to provide guidance for computer science or similar departments offering data science programs of study at the undergraduate level. The ACM Data Science Task Force has completed the initial draft of a curricular report. The computing-knowledge areas identified in the report are drawn from across computing disciplines and include several sub-areas of AI. This short paper describes the overall project, highlights AI-relevant areas, and seeks to open a dialog about the AI competencies that are to be considered central to a data science undergraduate curriculum.

    Off-Topic Detection in Conversational Telephone Speech

    In a context where information retrieval is extended to spoken “documents”, including conversations, it will be important to provide users with the ability to seek informational content, rather than the socially motivated small talk that appears in many conversational sources. In this paper we present a preliminary study aimed at automatically identifying “irrelevance” in the domain of telephone conversations. We apply a standard machine learning algorithm to build a classifier that detects off-topic sections with better-than-chance accuracy and that begins to provide insight into the relative importance of features for identifying utterances as on topic or not.
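    The abstract names only “a standard machine learning algorithm”; as one plausible stand-in (not necessarily the authors' choice, and with training utterances invented for illustration), a bag-of-words Naive Bayes classifier over on-topic versus off-topic utterances looks like this:

    ```python
    import math
    from collections import Counter, defaultdict

    # Minimal bag-of-words Naive Bayes for on/off-topic utterance classification.
    # Training utterances are invented for illustration only.
    train = [
        ("the router keeps dropping the connection", "on"),
        ("can you reset the line from your end", "on"),
        ("how was your weekend by the way", "off"),
        ("we went to the lake it was lovely", "off"),
    ]

    word_counts = defaultdict(Counter)   # per-class token counts
    class_counts = Counter()             # per-class utterance counts
    vocab = set()
    for text, label in train:
        tokens = text.split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)

    def predict(text):
        scores = {}
        for label in class_counts:
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(class_counts[label] / sum(class_counts.values()))
            total = sum(word_counts[label].values())
            for tok in text.split():
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

    print(predict("please reset the router"))     # -> "on"
    print(predict("lovely weekend at the lake"))  # -> "off"
    ```

    A study like the one described would additionally examine which features (lexical, prosodic, positional) carry the most weight, which a linear or tree-based model exposes more directly than this sketch does.
    
    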

    Problem Definition, Data Cleaning, and Evaluation: A Classifier Learning Case Study

    This paper is a case study of the process of problem definition, data cleaning, and evaluation, based on a long-term project addressing the automatic dispatch of technicians to fix faults in the local loop of a telephone network. The bottom line of the project is that simple learning techniques can be effective; however, constructing a convincing argument to that effect is far from simple. In particular, we had to consult multiple sources to obtain class labels, use domain knowledge to clean up data, compare with existing methods, and evaluate with data from multiple locations. Finally, it was necessary to use decision-analytic techniques to evaluate the cost-effectiveness of the learned classifiers, because evaluation based on classification accuracy is misleading without an analysis of cost-effectiveness. Our view is that application studies should be helpful in guiding future research. Therefore, we conclude by outlining useful directions suggested by our experience on this long-term project.
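    The point that classification accuracy can mislead without a cost analysis is easy to make concrete with a toy calculation (all costs and error counts below are invented, not the paper's figures): when the two error types cost very different amounts, the more accurate classifier can still be the more expensive one.

    ```python
    # Toy cost-sensitive comparison (all numbers invented for illustration).
    # Two error types with very different costs, as in technician dispatch.
    COST_FALSE_DISPATCH = 50.0   # unnecessary technician visit
    COST_MISSED_FAULT = 200.0    # fault left unrepaired, repeat trouble report

    def expected_cost(false_dispatches, missed_faults, n_cases):
        """Average cost per case given the two error counts."""
        return (false_dispatches * COST_FALSE_DISPATCH +
                missed_faults * COST_MISSED_FAULT) / n_cases

    def accuracy(false_dispatches, missed_faults, n_cases):
        return 1.0 - (false_dispatches + missed_faults) / n_cases

    n = 1000
    # Classifier A: more errors overall, but few of the expensive misses.
    a_cost = expected_cost(false_dispatches=120, missed_faults=10, n_cases=n)
    # Classifier B: higher accuracy, but its errors are the costly kind.
    b_cost = expected_cost(false_dispatches=20, missed_faults=60, n_cases=n)

    print(accuracy(120, 10, n), a_cost)  # 0.87 accuracy, 8.0 per case
    print(accuracy(20, 60, n), b_cost)   # 0.92 accuracy, 13.0 per case
    ```

    Classifier B wins on accuracy yet costs more per case, which is exactly why the paper argues for decision-analytic evaluation over raw accuracy.
    
    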

    Telecommunications Network Diagnosis

    The Scrubber 3 system monitors problems in the local loop of the telephone network, making automated decisions on tens of millions of cases a year, many of which lead to automated actions. Scrubber saves Bell Atlantic millions of dollars annually by reducing the number of inappropriate technician dispatches. Scrubber's core knowledge base, the Trouble Isolation Module (TIM), is a probability estimation tree constructed via several data mining processes. TIM currently is deployed in the Delphi system, which serves knowledge to multiple applications. Compared to previous approaches, TIM is more general, more robust, and easier to update when the network or user requirements change. Under certain circumstances it also provides better classifications. In fact, TIM's knowledge is general enough that it now serves a second deployed application. One of the most interesting aspects of the construction of TIM is that data mining was used not only in the traditional sense, namely building a model from a warehouse of actual historical cases; data mining was also used to produce an understandable model of the knowledge contained in an earlier, successful diagnostic system.
    NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
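    The abstract does not give TIM's construction details, but a probability estimation tree in general is a decision tree whose leaves return class probability estimates rather than hard labels. A minimal hand-built sketch (the attribute, counts, and split below are invented, not TIM's actual knowledge) with Laplace-corrected leaf frequencies:

    ```python
    # Generic probability estimation tree sketch (not TIM itself; data invented).
    # Leaves store class counts and return Laplace-corrected probabilities,
    # so sparsely populated leaves never claim complete certainty.

    class Leaf:
        def __init__(self, positives, negatives):
            self.pos, self.neg = positives, negatives

        def p_fault(self, case=None):
            # Laplace correction: (pos + 1) / (total + 2)
            return (self.pos + 1) / (self.pos + self.neg + 2)

    class Split:
        def __init__(self, attribute, branches):
            self.attribute, self.branches = attribute, branches

        def p_fault(self, case):
            # Route the case down the branch matching its attribute value.
            return self.branches[case[self.attribute]].p_fault(case)

    # Tiny hand-built tree over one diagnostic test result.
    tree = Split("line_test", {
        "open":  Leaf(positives=90, negatives=10),  # likely a real fault: dispatch
        "clear": Leaf(positives=2,  negatives=98),  # likely fine: avoid dispatch
    })

    print(round(tree.p_fault({"line_test": "open"}), 3))   # 0.892
    print(round(tree.p_fault({"line_test": "clear"}), 3))  # 0.029
    ```

    Returning calibrated probabilities instead of labels is what lets downstream applications apply their own cost thresholds, which fits the abstract's point that one knowledge base can serve multiple deployed systems.
    
    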